-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
feat(profiles): rollout data compression #92133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
|
Files with missing lines | Patch % | Lines |
---|---|---|
src/sentry/profiles/consumers/process/factory.py | 50.00% | 3 Missing |
Additional details and impacted files
@@ Coverage Diff @@
## master #92133 +/- ##
===========================================
+ Coverage 40.47% 87.86% +47.39%
===========================================
Files 10145 10179 +34
Lines 582238 583484 +1246
Branches 22627 22627
===========================================
+ Hits 235649 512690 +277041
+ Misses 346137 70342 -275795
Partials 452 452
4ee36cf
to
89cef9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👍
|
||
if random.random() < options.get("taskworker.try_compress.profile_metrics"): | ||
if random.random() < options.get("taskworker.try_compress.profile_metrics.rollout"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to change this, but in the future you can use sentry.options.rollout.in_random_rollout()
for these kinds of checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's good to know, thank you!
b64encoded_compressed = b64encode( | ||
zlib.compress( | ||
message.payload.value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the drive-by review after the fact:
Is there a specific reason why you went with zlib over zstd, which is an overall better compression algorithm which should be preferred?
And is the base64 encoding a hard requirement because the tasks can’t handle bytes
arguments?
It is just really weird that this end up being base64(zlib(base64(msgpack)))
.
We could just make this zstd(msgpack)
, or base64(zstd(msgpack))
in case we really can’t have bytes
.
base64 inflates the payload size by 33% by definition, and its a bit wasteful to do it twice even.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason why you went with zlib over zstd, which is an overall better compression algorithm which should be preferred?
zlib was in stdlib, and I didn't know we already had the necessary dependencies for zstandard.
It is just really weird that this end up being base64(zlib(base64(msgpack))).
The current implementation doesn't have a double base64 encode. We do base64 twice so that we can measure the effects of compression, but the task payload should only be base64 encoded once after compression.
And is the base64 encoding a hard requirement because the tasks can’t handle bytes arguments?
Yes, bytes
aren't JSON encodable, so we needed a way to get a str.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for clarifying. I got confused by the double-base64, re-reading this again, its clear there is no double-encoding going on :-D
No description provided.